Abstract

Traditional databases commonly support efficient query and update procedures that 
operate in time which is sublinear in the size of the database. Our goal in this paper is 
to take a first step toward dynamic reasoning in probabilistic databases with comparable 
efficiency. We propose a dynamic data structure that supports efficient algorithms for 
updating and querying singly connected Bayesian networks. In the conventional algorithm, 
new evidence is absorbed in time O(1) and queries are processed in time O(N), where N 
is the size of the network. We propose an algorithm which, after a preprocessing phase, 
allows us to answer queries in time O(log N) at the expense of O(log N) time per evidence 
absorption. The usefulness of sub-linear processing time manifests itself in applications 
requiring (near) real-time response over large probabilistic databases. We briefly discuss a 
potential application of dynamic probabilistic reasoning in computational biology. 



1. Introduction 

Probabilistic (Bayesian) networks are an increasingly popular modeling technique that has 
been used successfully in numerous applications of intelligent systems such as real-time plan- 
ning and navigation, model-based diagnosis, information retrieval, classification, Bayesian 
forecasting, natural language processing, computer vision, medical informatics and compu- 
tational biology. Probabilistic networks allow the user to describe the environment using 
a "probabilistic database" that consists of a large number of random variables, each corre- 
sponding to an important parameter in the environment. Some random variables could in 
fact be hidden and may correspond to some unknown parameters (causes) that influence 
the observable variables. Probabilistic networks are quite general and can store information 
such as the probability of failure of a particular component in a computer system, the 
probability of page i in a computer cache being requested in the near future, the probability 
of a document being relevant to a particular query, or the probability of an amino-acid 
subsequence in a protein chain folding into an alpha-helix conformation. 

The applications we have in mind include networks that are dynamically maintained to 
keep track of a probabilistic model of a changing system. For instance, consider the task of 
automated detection of power-plant failures. We might repeat a cycle that consists of the 
following sequence of operations: First we perform sensing operations. These operations 
cause updates to be performed to specific variables in the probabilistic database. Based on 
this evidence we estimate (query) the probability of failure in certain sites. More precisely, 
we query the probability distribution of the random variables that measure the probability 
of failure in these sites based on the evidence. Since the plant requires constant monitoring, 
we must repeat the cycle of sense/evaluate on a frequent basis. 

A conventional (non-probabilistic) database tracking the plant's state would not be 
appropriate here, because it is not possible to directly observe whether a failure is about 
to occur. On the other hand, a probabilistic "database" based on a Bayesian network 
will only be useful if the operations — update and query — can be performed very quickly. 
Because real-time or near real-time is so often necessary, the question of doing extremely 
fast reasoning in probabilistic networks is important. 

Traditional (non-probabilistic) databases support efficient query and update procedures 
that often operate in time which is sublinear in the size of the database (e.g., using bi- 
nary search). Our goal in this paper is to take a step toward systems that can perform 
dynamic probabilistic reasoning (such as what is the probability of an event given a set of 
observations) in time which is sublinear in the size of the probabilistic network. Typically, 
sublinear performance in complex networks is attained by using parallelism. This paper 
relies on preprocessing. 

Specifically, we describe new algorithms for performing queries and updates in belief 
networks in the form of trees (causal trees, polytrees and join trees). We define two natural 
database operations on probabilistic networks. 

1. Update-Node: Perform sensory input, modify the evidence at a leaf node (single 
variable) in the network and absorb this evidence into the network. 

2. Query-Node: Obtain the marginal probability distribution over the values of an 
arbitrary node (single variable) in the network. 

The standard algorithms introduced by Pearl (1988) can perform the Query-Node 
operation in O(1) time, although evidence absorption, i.e., the Update-Node operation, takes 
O(N) time, where N is the size of the network. Alternatively, one can assume that the 
Update-Node operation takes O(1) time (by simply recording the change) and the 
Query-Node operation takes O(N) time (evaluating the entire network). 

In this paper we describe an approach to perform both queries and updates in O(log N) 
time. This can be very significant in some systems, since we improve the ability of a system to 
respond after a change has been encountered from O(N) time to O(log N). Our approach is 
based on preprocessing the network using a form of node absorption in a carefully structured 
way to create a hierarchy of abstractions of the network. Previous uses of node absorption 
techniques were reported by Peot and Shachter (1991). 

We note that measuring complexity only in terms of the size of the network, N, can 
overlook some important factors. Suppose that each variable in the network has domain 
size k or less. For many purposes, k can be considered constant. Nevertheless, some of the 
algorithms we consider have a slowdown which is some power of k, which can become 
significant in practice unless N is very large. Thus we will be careful to state this slowdown 
where it exists. 

Section 2 considers the case of causal trees, i.e., singly connected networks in which each 
node has at most one parent. The standard algorithm (see Pearl, 1988) must use O(k^2 N) 
time for either updates or for retrieval, although one of these operations can be done in 
O(1) time. As we discuss briefly in Section 2.1, there is also a straightforward variant on 
this algorithm that takes O(k^2 D) time for both queries and updates, where D is the height 
of the tree. 

We then present an algorithm that takes O(k^3 log N) time for updates and O(k^2 log N) 
time for queries in any causal tree. This can of course represent a tremendous speedup, 
especially for large networks. Our algorithm begins with a polynomial-time preprocessing 
step (linear in the size of the network), constructing another data structure (which is not 
itself a probabilistic tree) that supports fast queries and updates. The techniques we use are 
motivated by earlier algorithms for dynamic arithmetic trees, and involve "caching" suffi- 
cient intermediate computations during the update phase so that querying is also relatively 
easy. We note, however, that there are substantial and interesting differences between the 
algorithm for probabilistic networks and those for arithmetic trees. In particular, as will be 
apparent later, computation in probabilistic trees requires both bottom-up and top-down 
processing, whereas arithmetic trees need only the former. Perhaps even more interest- 
ing is that the relevant probabilistic operations have a different algebraic structure than 
arithmetic operations (for instance, they lack distributivity). 

Bayesian trees have many applications in the literature including classification. For 
instance, one of the most popular methods for classification is the Bayes classifier that 
makes independence assumption on the features that are used to perform classification 
(Duda & Hart, 1973; Rachlin, Kasif, Salzberg, & Aha, 1994). Probabilistic trees have 
been used in computer vision (Hel-Or & Werman, 1992; Chelberg, 1990), signal processing 
(Wilsky, 1993), game playing (Delcher & Kasif, 1992), and statistical mechanics (Berger 
& Ye, 1990). Nevertheless, causal trees are fairly limited for modeling purposes. However, 
similar structures, called join trees, arise in the course of one of the standard algorithms for 
computing with arbitrary Bayesian networks (see Lauritzen and Spiegelhalter, 1988). Thus 
our algorithm for join trees has potential relevance to many networks that are not trees. 
Because join trees have some special structure, they allow some optimization of the basic 
causal-tree algorithm. We elaborate on this in Section 5. 

In Section 6 we consider the case of arbitrary polytrees. We give an O(log N) 
algorithm for updates and queries, which involves transforming the polytree to a join tree, and 
then using the results of Sections 2 and 5. The join tree of a polytree has a particularly 
simple form, giving an algorithm in which updates take O(k^{p+3} log N) time and queries 
O(k^{p+2} log N), where p is the maximum number of parents of any node. Although the 
constant appears large, it must be noted that the original polytree takes O(k^{p+1} N) space 
merely to represent, if conditional probability tables are given as explicit matrices. 

Figure 1: A segment of a causal tree. 



Finally, we discuss a specific modelling application in computational biology where prob- 
abilistic models are used to describe, analyze and predict the functional behavior of biolog- 
ical sequences such as protein chains or DNA sequences (see Delcher, Kasif, Goldberg, and 
Hsu, 1993 for references). Much of the information in computational biology databases is 
noisy. However, a number of successful attempts to build probabilistic models have been 
made. In this case, we use a probabilistic tree of depth 300 that consists of 600 nodes and all 
the matrices of conditional probabilities are 2x2. The tree is used to model the dependence 
of a protein's secondary structure on its chemical structure. The detailed description of the 
problem and experimental results are given by Delcher et al. (1993). For this problem we 
obtain an effective speed-up of about a factor of 10 to perform an update as compared to the 
standard algorithm. Clearly, getting an order of magnitude improvement in the response 
time of a probabilistic real-time system could be of tremendous importance in future use of 
such systems. 

2. Causal Trees 

A probabilistic causal tree is a directed tree in which each node represents a discrete random 
variable X, and each directed edge is annotated by a matrix of conditional probabilities 
M_{Y|X} (associated with edge X → Y). That is, if x is a possible value of X, and y of Y, 
then the (x,y)th component of M_{Y|X} is Pr(Y = y | X = x). Such a tree represents a joint 
probability distribution over the product space of all variables; for detailed definitions and 
discussion see Pearl (1988). Briefly, the idea is that we consider the product, over all nodes, 
of the conditional probability of the node given its parents. For example, in Figure 1 the 
implied distribution is: 

Pr(U = u, V = v, X = x, Y = y, Z = z) = 

Pr(U = u) Pr(V = v | U = u) Pr(X = x | U = u) Pr(Y = y | X = x) Pr(Z = z | X = x). 

Given particular values of u, v, x, y, z, the conditional probabilities can be read from the 
appropriate matrices M. One advantage of such a product representation is that it is very 
concise. In this example, we need four matrices and the unconditional probability over U, 
but the size of each is at most the square of the largest variable's domain size. In contrast, 
a general distribution over N variables requires an exponential (in N) representation. 
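
As a minimal sketch of the product representation, the Python fragment below evaluates the joint distribution of the tree in Figure 1 for binary variables. The numerical entries, the dictionary layout `M[(child, parent)]`, and the function name `joint` are illustrative assumptions, not values taken from the paper.

```python
from itertools import product

# Hypothetical tables for the tree of Figure 1 with binary variables (k = 2).
# M[(child, parent)][p][c] = Pr(child = c | parent = p); all numbers are made up.
prior_U = [0.6, 0.4]
M = {
    ("V", "U"): [[0.9, 0.1], [0.3, 0.7]],
    ("X", "U"): [[0.8, 0.2], [0.5, 0.5]],
    ("Y", "X"): [[0.7, 0.3], [0.2, 0.8]],
    ("Z", "X"): [[0.6, 0.4], [0.1, 0.9]],
}

def joint(u, v, x, y, z):
    """Pr(U=u, V=v, X=x, Y=y, Z=z) as the product of the local conditionals."""
    return (prior_U[u]
            * M[("V", "U")][u][v]
            * M[("X", "U")][u][x]
            * M[("Y", "X")][x][y]
            * M[("Z", "X")][x][z])

# The product form defines a proper distribution: it sums to 1 over all 2^5 states.
total = sum(joint(*vals) for vals in product(range(2), repeat=5))

# Numbers stored by the tree (prior plus four 2 x 2 matrices) versus a full table.
stored = len(prior_U) + sum(len(row) for m in M.values() for row in m)
full_table = 2 ** 5
print(total, stored, full_table)
```

The gap between `stored` and `full_table` is small here, but the first grows linearly and the second exponentially with the number of variables.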

Of course, not every distribution can be represented as a causal tree. But it turns out 
that the product decomposition implied by the tree corresponds to a particular pattern 
of conditional independencies which often hold (if perhaps only approximately) in real 
applications. Intuitively speaking, in Figure 1 some of these implied independencies are 
that the conditional probability of U given V, X, Y and Z depends only on values of V and 
X; and the probability of Y given U, V, X, and Z depends only on X. Independencies of 
this sort can arise for many reasons, for instance from a causal modeling of the interactions 
between the variables. We refer the reader to Pearl (1988) for details related to the modeling 
of independence assumptions using graphs. 

In the following, we make several assumptions that significantly simplify the 
presentation, but do not sacrifice generality. First, we assume that each variable ranges over the 
same, constant, number of values k.¹ It follows that the marginal probability distribution 
for each variable can be viewed as a k-dimensional vector, and each conditional probability 
matrix such as M_{Y|X} is a square k × k matrix. A common case is that of binary random 
variables (k = 2); the distribution over the values (TRUE, FALSE) is then (p, 1 - p) for 
some probability p. 

The next assumption is that the tree is binary, and complete, so that each node has 
0 or 2 children. Any tree can be converted into this form, by at most doubling the number 
of nodes. For instance, suppose node p has children c_1, c_2, c_3 in the original tree. We can 
create another "copy" of p, p', and rearrange the tree such that the two children of p are 
c_1 and p', and the two children of p' are c_2 and c_3. We can constrain p' always to have the 
same value as p simply by choosing the identity matrix for the conditional probability table 
between p and p'. Then the distribution represented by the new tree is effectively the same 
as the original. Similarly, we can always add "dummy" leaf nodes if necessary to ensure a 
node has two children. As explained in the introduction, we are interested in processes in 
which certain variables' values are observed, upon which we wish to condition. Our final 
assumption is that these observed evidence nodes are all leaves of the tree. Again, because 
it is possible to "copy" nodes and to add dummy nodes, this is not restrictive. 
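
The conversion to a complete binary tree can be sketched as follows. This is only an illustration under assumed conventions: the tree is given as a hypothetical map from node names to ordered child lists, copies are named by appending a prime, dummy leaves are named `dummy1`, `dummy2`, and so on, and the identity matrices are recorded only as a set of edges rather than built explicitly.

```python
def binarize(children):
    """Return (new child map, identity edges) with every node having 0 or 2 children.

    `children` maps each node name to the ordered list of its child names.
    An edge (p, p') in the returned set is meant to carry the identity matrix,
    so that the copy p' always takes the same value as p.
    """
    out = {}
    identity_edges = set()
    n_dummy = 0
    for node, kids in children.items():
        kids = list(kids)
        cur = node
        # Peel off one child at a time; the remaining children hang under a copy.
        while len(kids) > 2:
            copy = cur + "'"
            out[cur] = [kids[0], copy]
            identity_edges.add((cur, copy))
            kids = kids[1:]
            cur = copy
        # A node with a single child receives a dummy leaf as its second child.
        if len(kids) == 1:
            n_dummy += 1
            kids.append("dummy%d" % n_dummy)
        out[cur] = kids
    return out, identity_edges

# Example from the text: p has children c1, c2, c3; c3 has a single child d.
tree = {"p": ["c1", "c2", "c3"], "c1": [], "c2": [], "c3": ["d"], "d": []}
binary, identity_edges = binarize(tree)
print(binary, identity_edges)
```

Each extra child beyond the second costs one copy and each single child costs one dummy leaf, which is consistent with the bound of at most doubling the number of nodes.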

The product distribution alluded to above corresponds to the distribution over variables 
prior to any observations. In practice, we are more interested in the conditional distribution, 
which is simply the result of conditioning on all the observed evidence (which, by the earlier 
assumption, corresponds to seeing values for all the leaf nodes). Thus, for each non-leaf node 
X we are interested in the conditional marginal probability over X, i.e., the k-dimensional 
vector: 

Bel(X) = Pr(X|all evidence values). 

1. This assumption is nonrestrictive because we can add "dummy" values to each variable's range, which 
should be given conditional probability 0. Nevertheless, there may be some computational advantage in 
allowing different variable domain sizes. The changes required to permit this are not difficult, but since 
they complicate the presentation somewhat we omit them. 

The main algorithmic problem is to compute Bel(X) for each (non-evidence) node X 
in the tree given the current evidence. It is well known that the probability vector Bel(X) 
can be computed in linear time (in the size of the tree) by a popular algorithm based on 
the following equation: 

Bel(X) = Pr(X | all evidence) = α * λ(X) * π(X) 

Here α is a normalizing constant, λ(X) is the probability of all the evidence in the subtree 
below node X given X, and π(X) is the probability of X given all evidence in the rest of the 
tree. To interpret this equation, note that if X = (x_1, x_2, ..., x_k) and Y = (y_1, y_2, ..., y_k) 
are two vectors we define * to be the operation of component-wise product (pairwise or 
dyadic product of vectors): 

X * Y = (x_1 y_1, x_2 y_2, ..., x_k y_k). 

The usefulness of λ(X) and π(X) derives from the fact that they can be computed 
recursively, as follows: 

1. If X is the root node, π(X) is the prior probability of X. 

2. If X is a leaf node, λ(X) is a vector with 1 in the ith position (where the ith value 
has been observed) and 0 elsewhere. If no value for X has been observed, then λ(X) 
is a vector consisting of all 1's.² 

3. Otherwise, if, as shown in Figure 1, the children of node X are Y and Z, its sibling 
is V and its parent is U, we have: 

λ(X) = (M_{Y|X} · λ(Y)) * (M_{Z|X} · λ(Z)) 
π(X) = M_{X|U}^T · (π(U) * (M_{V|U} · λ(V))) 

Our presentation of this technique follows that of Pearl (1988). However, we use a 
somewhat different notation in that we don't describe messages sent to parents or 
successors, but rather discuss the direct relations among the π and λ vectors in terms of simple 
algebraic equations. We will take advantage of algebraic properties of these equations in 
our development. 

It is very easy to see that the equations above can be evaluated in time proportional to 
the size of the network. The formal proof is given by Pearl (1988). 

Theorem 1: The belief distribution of every variable (that is, the marginal probability 
distribution for each variable, given the evidence) in a causal tree can be evaluated in 
O(k^2 N) time where N is the size of the tree. (The factor k^2 is due to the multiplication 
of a matrix by a vector that must be performed at each node.) 
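
The recursion can be checked on the five-node tree of Figure 1. The sketch below is an illustration only: the tables and the evidence are made-up values, and the helper names (`matvec`, `star`, `brute_bel_X`) are hypothetical. It computes Bel(X) from λ(X) and π(X) and compares the result with a brute-force sum over the joint distribution.

```python
from itertools import product

K = 2  # domain size; every number below is a made-up illustration value

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(K)) for i in range(K)]

def transpose(m):
    return [[m[j][i] for j in range(K)] for i in range(K)]

def star(a, b):
    """Component-wise product of two vectors (the * operation of the text)."""
    return [x * y for x, y in zip(a, b)]

def normalize(v):
    s = sum(v)
    return [x / s for x in v]

prior_U = [0.6, 0.4]
M_VU = [[0.9, 0.1], [0.3, 0.7]]   # rows indexed by u, columns by v
M_XU = [[0.8, 0.2], [0.5, 0.5]]
M_YX = [[0.7, 0.3], [0.2, 0.8]]
M_ZX = [[0.6, 0.4], [0.1, 0.9]]

# Evidence at the leaves: V = 1 and Y = 0 are observed, Z is unobserved.
lam_V, lam_Y, lam_Z = [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]

# Bottom-up step for lambda(X), top-down step for pi(X).
lam_X = star(matvec(M_YX, lam_Y), matvec(M_ZX, lam_Z))
pi_U = prior_U                                  # U is the root
pi_X = matvec(transpose(M_XU), star(pi_U, matvec(M_VU, lam_V)))

# pi_X is left unnormalized here; the constant alpha absorbs the scaling.
bel_X = normalize(star(lam_X, pi_X))

def brute_bel_X():
    """Pr(X | V = 1, Y = 0) by summing the joint over U and Z."""
    w = [0.0] * K
    v, y = 1, 0
    for u, x, z in product(range(K), repeat=3):
        w[x] += prior_U[u] * M_VU[u][v] * M_XU[u][x] * M_YX[x][y] * M_ZX[x][z]
    return normalize(w)

print(bel_X, brute_bel_X())
```

Each λ or π vector costs a constant number of matrix-by-vector products, which is where the k^2 factor per node comes from.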

This theorem shows that it is possible to perform evidence absorption in O(N) time, and 
queries in constant time (i.e., by retrieving the previously computed values from a lookup 
table). In the next sections we will show how to perform both queries and updates in 
worst-case O(log N) time. Intuitively, we will not recompute all the marginal distributions 
after an update, but rather make only a small number of changes, sufficient, however, to 
compute the value of any variable with only a logarithmic delay. 

2. Or we can set to 1 all components corresponding to possible values; this is especially useful when the 
observed variable is part of a join-tree clique (Section 5). In general, λ(X) should be thought of as the 
likelihood vector over X given our observations about X. 

2.1 A Simple Preprocessing Approach 

To obtain intuition about the new approach we begin with a very simple observation. 
Consider a causal tree T of depth D. For each node X in the tree we initially compute its 
λ(X) vector; the π vectors are left uncomputed. Given an update to a node Y, we calculate the 
revised λ(X) vectors for all nodes X that are ancestors of Y in the tree. This clearly can be 
done in time proportional to the depth of the tree, i.e., O(D). The rest of the information 
in the tree remains unchanged. Now consider a Query-Node operation for some node V 
in the tree. We obviously already have the accurate λ(V) vector for every node in the tree 
including V. However, in order to compute its π(V) vector we need to compute only the 
π(Y) vectors for all the nodes Y above V in the tree and multiply these by the appropriate 
λ vectors that are kept current. This means that to compute the accurate π(V) vector we 
need to perform O(D) work as well. Thus, in this approach we don't perform the complete 
update to every λ(X) and π(X) vector in the tree. 

Lemma 2: Update-Node and Query-Node operations in a causal tree T can be 
performed in O(k^2 D) time where D is the depth of the tree. 

This implies that if the tree is balanced, both operations can be done in O(log N) 
time. However, in some important applications the trees are not balanced (e.g., models of 
temporal sequences, Delcher et al., 1993). The obvious question therefore is: Given a causal 
tree T can we produce an equivalent balanced tree T'? While the answer to this question 
appears to be difficult, it is possible to use a more sophisticated approach to produce a data 
structure (which is not a causal tree) to process queries and updates in O(log N) time. This 
approach is described in the subsequent sections. 

2.2 A Dynamic Data Structure For Causal Trees 

The data structure that will allow efficient incremental processing of a probabilistic tree T = 
T_0 will be a sequence of trees, T_0, T_1, T_2, ..., T_i, ..., T_{log N}. Each T_{i+1} will be a contracted 
version of T_i, whose nodes are a subset of those in T_i. In particular, T_{i+1} will contain about 
half as many leaves as its predecessor. 

We defer the details of this contraction process until the next section. However, one key 
idea is that we maintain consistency, in the sense that Bel(X), λ(X), and π(X) are given 
the same values by all the trees in which X appears. We choose the conditional probability 
matrices in the contracted trees (i.e., all trees other than T_0) to ensure this. 

Recall that the λ and π equations have the form 

λ(X) = (M_{Y|X} · λ(Y)) * (M_{Z|X} · λ(Z)) 
π(X) = M_{X|U}^T · (π(U) * (M_{V|U} · λ(V))) 

if Y and Z are children of X, X is a right child of U, and V is X's sibling (Figure 1). 
However, these equations are not in the most convenient form and the following notational 
conventions will be very helpful. First, let A_i(x) (resp., B_i(x)) denote the conditional 
probability matrix between X and X's left (resp., right) child in the tree T_i. Note that the 
identity of these children can differ from tree to tree, because some of X's original children 
might be removed by the contraction process. One advantage of the new notation is that 
the explicit dependence on the identity of the children is suppressed. Next, suppose X's 
parent in T_i is U. Then we let C_i(x) denote either A_i(u) or B_i(u), and D_i(x) denote either 
B_i(u)^T or A_i(u)^T, depending on whether X is the right or left child, respectively, of U. It 
will not be necessary to keep careful track of these correspondences, but simply to note that 
the above equations become:³ 

λ(x) = A_i(x) · λ(y) * B_i(x) · λ(z) 
π(x) = D_i(x) · (π(u) * C_i(x) · λ(v)) 

Figure 2: The effect of the operation Rake(e, x). e must be a leaf, but z may or may not 
be a leaf. 

In the next section we describe the preprocessing step that creates the dynamic data 
structure. 

2.3 Rake Operation 

The basic operation used to contract the tree is Rake which removes both a leaf and its 
parent from the tree. The effect of this operation on the tree is shown in Figure 2. We 
now define the algebraic effect of this operation on the equations associated with this tree. 
Recall that we want to define the conditional probability matrices in the raked tree so that 
the distribution over the remaining variables is unchanged. We achieve this by substituting 
the equations for λ(x) and π(x) into the equations for λ(u), π(z), and π(v). In the following, 
it is important to note that π(u), λ(z) and λ(v) are unaffected by the rake operation. 

In the following, let Diag_a denote the diagonal matrix whose diagonal entries are the 
components of the vector a. We derive the algebraic effect of the rake operation as follows: 

λ(u) = A_i(u) · λ(v) * B_i(u) · λ(x) 
     = A_i(u) · λ(v) * B_i(u) · (A_i(x) · λ(e) * B_i(x) · λ(z)) 
     = A_i(u) · λ(v) * B_i(u) · (Diag_{A_i(x)·λ(e)} · B_i(x) · λ(z)) 
     = A_i(u) · λ(v) * (B_i(u) · Diag_{A_i(x)·λ(e)} · B_i(x)) · λ(z) 
     = A_{i+1}(u) · λ(v) * B_{i+1}(u) · λ(z) 

where A_{i+1}(u) = A_i(u) and B_{i+1}(u) = B_i(u) · Diag_{A_i(x)·λ(e)} · B_i(x). (Of course, the case 
where the leaf being raked is a right child generates analogous equations.) 

3. Throughout, we assume that * has lower precedence than matrix multiplication (indicated by ·). 

Thus, by defining A_{i+1}(u) and B_{i+1}(u) in this way, we ensure that all λ values in the raked 
tree are identical to the corresponding values in the original tree. This is not yet enough, 
because we must check that π values are similarly preserved. The only two values that could 
possibly change are π(z) and π(v), so we check them both. For the former, we must have 

π(z) = D_i(z) · (π(x) * C_i(z) · λ(e)) 
     = D_{i+1}(z) · (π(u) * C_{i+1}(z) · λ(v)). 

After substituting for π(x) and some algebraic manipulation, we see that this is assured if 
C_{i+1}(z) = C_i(x) and D_{i+1}(z) = D_i(z) · Diag_{C_i(z)·λ(e)} · D_i(x). However, recall that, by 
definition, C_{i+1}(z) = A_{i+1}(u) and C_i(x) = A_i(u), and so C_{i+1}(z) = C_i(x) follows. Furthermore, 

D_{i+1}(z) = B_{i+1}(u)^T 
           = (B_i(u) · Diag_{A_i(x)·λ(e)} · B_i(x))^T 
           = B_i(x)^T · Diag_{A_i(x)·λ(e)} · B_i(u)^T 
           = D_i(z) · Diag_{C_i(z)·λ(e)} · D_i(x) 

as required. 

For π(v) it is necessary to verify that 

π(v) = D_i(v) · (π(u) * C_i(v) · λ(x)) 
     = D_{i+1}(v) · (π(u) * C_{i+1}(v) · λ(z)). 

By substituting for λ(x), this can be shown to be true if D_{i+1}(v) = D_i(v) = A_i(u)^T = 
A_{i+1}(u)^T and C_{i+1}(v) = C_i(v) · Diag_{A_i(x)·λ(e)} · B_i(x) = B_{i+1}(u). But these identities follow 
by definition, so we are done. 
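
The identities above can be checked numerically. The sketch below is an illustration under assumed values: it draws random matrices and vectors for the configuration of Figure 2 (u with left child v and right child x; x with left leaf e and right child z), applies the definition of B_{i+1}(u), and compares λ(u), π(z) and π(v) before and after the rake. All helper names are hypothetical.

```python
import random

K = 3
random.seed(0)

def rand_matrix():
    """Random row-stochastic K x K matrix (a made-up conditional probability table)."""
    rows = []
    for _ in range(K):
        r = [random.random() for _ in range(K)]
        s = sum(r)
        rows.append([x / s for x in r])
    return rows

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(K)) for i in range(K)]

def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(K)) for j in range(K)]
            for i in range(K)]

def transpose(m):
    return [[m[j][i] for j in range(K)] for i in range(K)]

def diag(v):
    return [[v[i] if i == j else 0.0 for j in range(K)] for i in range(K)]

def star(a, b):
    return [x * y for x, y in zip(a, b)]

A_u, B_u, A_x, B_x = (rand_matrix() for _ in range(4))
lam_v, lam_e, lam_z = ([random.random() for _ in range(K)] for _ in range(3))
pi_u = [random.random() for _ in range(K)]

# Tree T_i, before the rake.
lam_x = star(matvec(A_x, lam_e), matvec(B_x, lam_z))
lam_u_before = star(matvec(A_u, lam_v), matvec(B_u, lam_x))
pi_x = matvec(transpose(B_u), star(pi_u, matvec(A_u, lam_v)))
pi_z_before = matvec(transpose(B_x), star(pi_x, matvec(A_x, lam_e)))
pi_v_before = matvec(transpose(A_u), star(pi_u, matvec(B_u, lam_x)))

# Tree T_{i+1}, after Rake(e, x): z is now the right child of u.
B_u_next = matmul(matmul(B_u, diag(matvec(A_x, lam_e))), B_x)
lam_u_after = star(matvec(A_u, lam_v), matvec(B_u_next, lam_z))
pi_z_after = matvec(transpose(B_u_next), star(pi_u, matvec(A_u, lam_v)))
pi_v_after = matvec(transpose(A_u), star(pi_u, matvec(B_u_next, lam_z)))

def close(a, b):
    return all(abs(p - q) < 1e-12 for p, q in zip(a, b))

print(close(lam_u_before, lam_u_after),
      close(pi_z_before, pi_z_after),
      close(pi_v_before, pi_v_after))
```

The only new quantity needed after the rake is the single matrix `B_u_next`, which is why one rake costs a constant number of matrix products.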

Beginning with the given tree T = T_0, each successive tree is constructed by performing 
a sequence of rakes, so as to rake away about half of the remaining evidence nodes. More 
specifically, let Contract be the operation in which we apply the Rake operation to every 
other leaf of a causal tree, in left-to-right order, excluding the leftmost and the rightmost 
leaf. Let {T_i} be the set of causal trees constructed so that T_{i+1} is the causal tree generated 
from T_i by a single application of Contract. The following result is proved using an easy 
inductive argument: 

Theorem 3: Let T_0 be a causal tree of size N. Then the number of leaves in T_{i+1} is equal 
to half the leaves in T_i (not counting the two extreme leaves) so that starting with T_0, 
after O(log N) applications of Contract, we produce a three-node tree: the root, the 
leftmost leaf and the rightmost leaf. 
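
The leaf count in Theorem 3 can be illustrated by tracking only the left-to-right sequence of leaves. This sketch deliberately ignores the tree structure and the matrices; the function names `contract_leaves` and `levels` are hypothetical.

```python
def contract_leaves(leaves):
    """Leaves that survive one Contract: every other leaf is raked,
    and the leftmost and rightmost leaves are always spared."""
    last = len(leaves) - 1
    return [leaf for i, leaf in enumerate(leaves) if i % 2 == 0 or i == last]

def levels(leaves):
    """Leaf sequences of T_0, T_1, ... until only the two extreme leaves remain."""
    seq = [list(leaves)]
    while len(seq[-1]) > 2:
        seq.append(contract_leaves(seq[-1]))
    return seq

# Five leaves, as in a chain with four internal nodes.
seq = levels(["e1", "e2", "e3", "e4", "e5"])
print(seq)

# Number of Contract applications for a tree with 1025 leaves.
print(len(levels(list(range(1025)))) - 1)
```

Because roughly half of the interior leaves disappear at each step, the number of levels grows logarithmically with the number of leaves.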

Below are a few observations about this process: 

1. The complexity of Contract is linear in the size of the tree. Additionally, log N 
applications of Contract reduce the set of tree equations to a single equation involving 
the root in O(N) total time. 

2. The total space to store all the sets of equations associated with {T_i}, 0 ≤ i ≤ log N, is about 
twice the space required to store the equations for T_0. 

3. With each equation in T_{i+1} we also store equations that describe the relationship 
between the conditional probability matrices in T_{i+1} and the matrices in T_i. Notice 
that, even though T_{i+1} is produced from T_i by a series of rake operations, each matrix 
in T_{i+1} depends directly on matrices present in T_i. This would not be the case if we 
attempted to simultaneously rake adjacent children. 

We regard these equations as part of T_{i+1}. So, formally speaking, {T_i} are causal trees 
augmented with some auxiliary equations. Each of the contracted trees describes a 
probability distribution on a subset of the first set of variables that is consistent with 
the original distribution. 

We note that the ideas behind the Rake operation were originally developed by Miller 
and Reif (1985) in the context of parallel computation of bottom-up arithmetic expression 
trees (Kosaraju & Delcher, 1988; Karp & Ramachandran, 1990). In contrast, we are using 
it in the context of incremental update and query operations in sequential computing. A 
similar data structure to ours was independently proposed by Frederickson (1993) in the 
context of dynamic arithmetic expression trees, and a different approach for incremental 
computing on arithmetic trees was developed by Cohen and Tamassia (1991). There are 
important and interesting differences between the arithmetic expression-tree case and our 
own. For arithmetic expressions all computation is done bottom-up. However, in 
probabilistic networks π-messages must be passed top-down. Furthermore, in arithmetic expressions 
when two algebraic operations are allowed, we typically require the distributivity of one 
operation over the other, but the analogous property does not hold for us. In these 
respects our approach is a substantial generalization of the previous work, while remaining 
conceptually simple and practical. 

3. Example: A Chain 

To obtain an intuition about the algorithms, we sketch how to generate and utilize the 
T_i, 0 ≤ i ≤ log N, and their equations to perform λ-value queries and updates in O(log N) 
time on an N = 2L + 1 node chain of length L. Consider the chain of length 4 in Figure 3, 
and the trees that are generated by repeated application of Contract to the chain. 

The equations that correspond to the contracted trees in the figure are as follows 
(ignoring trivial equations). Recall that A_i(x_j) is the matrix associated with the left edge of 
random variable x_j in T_i. 



For T_0: 

λ(x_1) = A_0(x_1) · λ(e_1) * B_0(x_1) · λ(x_2) 
λ(x_2) = A_0(x_2) · λ(e_2) * B_0(x_2) · λ(x_3) 
λ(x_3) = A_0(x_3) · λ(e_3) * B_0(x_3) · λ(x_4) 
λ(x_4) = A_0(x_4) · λ(e_4) * B_0(x_4) · λ(e_5) 

For T_1: 

λ(x_1) = A_1(x_1) · λ(e_1) * B_1(x_1) · λ(x_3) 
λ(x_3) = A_1(x_3) · λ(e_3) * B_1(x_3) · λ(e_5) 

where 

B_1(x_1) = B_0(x_1) · Diag_{A_0(x_2)·λ(e_2)} · B_0(x_2) 
B_1(x_3) = B_0(x_3) · Diag_{A_0(x_4)·λ(e_4)} · B_0(x_4) 

For T_2: 

λ(x_1) = A_2(x_1) · λ(e_1) * B_2(x_1) · λ(e_5) 

where 

B_2(x_1) = B_1(x_1) · Diag_{A_1(x_3)·λ(e_3)} · B_1(x_3) 

Figure 3: A simple chain example. 

We have not listed the A matrices because, in this example, they are constant. Now 
consider a query operation on x_2. Rather than performing the standard computation we 
will find the level where x_2 was "raked". Since this occurred on level 0, we obtain the 
equation 

λ(x_2) = A_0(x_2) · λ(e_2) * B_0(x_2) · λ(x_3) 

Thus we must compute λ(x_3), and to do this we find where x_3 is "raked". That happened 
on level 1. However, on that level the equation associated with x_3 is: 

λ(x_3) = A_1(x_3) · λ(e_3) * B_1(x_3) · λ(e_5) 

That means that we need not follow down the chain. In general for a chain of N nodes we 
can answer any query to a node on the chain by evaluating log N equations instead of N 
equations. 

Now consider an update for e_4. Since e_4 was raked immediately, we first modify the 
equation 

B_1(x_3) = B_0(x_3) · Diag_{A_0(x_4)·λ(e_4)} · B_0(x_4) 

on the first level where e_4 occurs on the right-hand side. Since B_1(x_3) is affected by the 
change to e_4, we subsequently modify the equation 

B_2(x_1) = B_1(x_1) · Diag_{A_1(x_3)·λ(e_3)} · B_1(x_3) 

on the second level. In general, we clearly need to update at most log N equations; i.e., one 
per level. We now generalize this example and describe general algorithms for queries and 
updates in causal trees. 

3.1 Performing Queries And Updates Efficiently 

In this section we shall show how to utilize the contracted trees T_i, 0 ≤ i ≤ log N, to 
perform queries and updates in O(log N) time in general causal trees. We shall show that a 
logarithmic amount of work will be necessary and sufficient to compute enough information 
in our data structure to update and query any λ or π value. 

3.2 λ Queries 

To compute λ(x) for some node x we can do the following. We first locate ind(x), which is 
defined to be the highest level i such that x appears in T_i. The equation for λ(x) is of the 
form: 

λ(x) = A_i(x) · λ(y) * B_i(x) · λ(z) 

where y and z are the left and right children, respectively, of x in T_i. 

Since x does not appear in T_{i+1}, it was raked at this level of equations, which implies 
that one child (we assume z) is a leaf. We therefore only need to compute λ(y), which can 
be done recursively. If instead y was the raked leaf, we would compute λ(z) recursively. 

In either case O(1) operations are done in addition to one recursive call, which is to a 
value at a higher level of equations. Since there are O(log N) levels, and the only operations 
are matrix by vector multiplications, the procedure takes O(k^2 log N) time. The function 
λ-Query(x) is given in Figure 4. 

3.3 Updates 

We now describe how the update operations can modify enough information in the data
structure to allow us to query the $\lambda$ vectors and $\pi$ vectors efficiently. Most importantly, the
reader should note that the update operation does not try to maintain the correct $\pi$ and
$\lambda$ values. It is sufficient to ensure that, for all $i$ and $x$, the matrices $A_i(x)$ and $B_i(x)$ (and
thus also $C_i(x)$ and $D_i(x)$) are always up to date.

When we update the value of an evidence node, we are simply changing the $\lambda$ value of
some leaf $e$. At each level of equations, the value of $\lambda(e)$ can appear at most twice: once
in the $\lambda$-equation of $e$'s parent and once in the $\pi$-equation of $e$'s sibling in $T_i$. When $e$
disappears, say at level $i$, its value is incorporated into one of the constant matrices $A_{i+1}(u)$
or $B_{i+1}(u)$, where $u$ is the grandparent of $e$ in $T_i$. This constant matrix in turn affects
exactly one constant matrix in the next higher level, and so on. Since the effect at each
level can be computed in $O(k^3)$ time (due to matrix multiplication) and there are $O(\log N)$
levels of equations, the update can be accomplished in $O(k^3 \log N)$ time. The $k^3$ factor
is actually pessimistic, because faster matrix multiplication algorithms exist.

The update procedure is given in Figure 5. Update is initially called as
$\lambda$-Update($\lambda(E) = e$, $i$), where $E$ is a leaf, $i$ the level at which it was raked, and $e$ is the
new evidence. This operation will start a sequence of $O(\log N)$ calls to the function
$\lambda$-Update(Term = Value, $i$), as the change will propagate to $\log N$ equations.
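The propagation can be sketched as follows, under stated assumptions: the dictionary `store`, the per-level map `levels`, the string keys, and the helper `rake` are all hypothetical names introduced for this illustration, and the two equations are the ones from the chain example for $e_4$ with made-up 2-state matrices. It is a sketch of the idea (one recomputed matrix per level), not the authors' data structure.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def rake(B_u, A_x, lam_e, B_x):
    # B_{i+1}(u) = B_i(u) . Diag_{A_i(x) . lambda(e)} . B_i(x)
    d = matvec(A_x, lam_e)
    scaled = [[entry * d[c] for c, entry in enumerate(row)] for row in B_u]
    return matmul(scaled, B_x)

def lambda_update(levels, store, term, value, i):
    # Store the new value, then recompute the one dependent matrix per level.
    store[term] = value
    touched = 0
    while i < len(levels) and term in levels[i]:
        target, recompute = levels[i][term]
        store[target] = recompute(store)
        term, i, touched = target, i + 1, touched + 1
    return touched

I2 = [[1.0, 0.0], [0.0, 1.0]]
M = [[0.9, 0.1], [0.2, 0.8]]          # illustrative values only
store = {"B0(x3)": M, "A0(x4)": I2, "B0(x4)": M, "lam(e4)": [1.0, 1.0],
         "B1(x1)": I2, "A1(x3)": M, "lam(e3)": [0.0, 1.0]}
levels = [
    {"lam(e4)": ("B1(x3)", lambda s: rake(s["B0(x3)"], s["A0(x4)"],
                                          s["lam(e4)"], s["B0(x4)"]))},
    {"B1(x3)": ("B2(x1)", lambda s: rake(s["B1(x1)"], s["A1(x3)"],
                                         s["lam(e3)"], s["B1(x3)"]))},
]
touched = lambda_update(levels, store, "lam(e4)", [1.0, 0.0], 0)
```

Here `touched` counts the recomputed equations; in this two-level toy setting it is 2, one per level, and each recomputation costs a constant number of $k \times k$ matrix products.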
FUNCTION $\lambda$-Query($x$)

We look up the equation associated with $\lambda(x)$ in $T_{\mathrm{ind}(x)}$.

Case 1: $x$ is a leaf. Then the equation is of the form $\lambda(x) = e$, where $e$ is known. In
this case we return $e$.

Case 2: The equation associated with $\lambda(x)$ is of the form

$$\lambda(x) = A_i(x) \cdot \lambda(y) * B_i(x) \cdot \lambda(z)$$

where $z$ is a leaf and therefore $\lambda(z)$ is known. In this case we return

$$A_i(x) \cdot \lambda\text{-Query}(y) * B_i(x) \cdot \lambda(z)$$

The case where $y$ is the leaf is analogous.

Figure 4: Function to compute the $\lambda$ value of a node.
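A minimal Python sketch of the recursion in Figure 4 follows. The dictionary `eqs`, its tuple layout, and the node names are hypothetical; each non-leaf entry is assumed to hold the equation taken from $T_{\mathrm{ind}(x)}$, and the matrices are illustrative 2-state values.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lambda_query(eqs, x):
    eq = eqs[x]
    if eq[0] == "leaf":                 # Case 1: lambda(x) = e is known
        return eq[1]
    _, A, y, B, z = eq                  # Case 2: lambda(x) = A.lambda(y) * B.lambda(z)
    left = matvec(A, lambda_query(eqs, y))
    right = matvec(B, lambda_query(eqs, z))
    return [l * r for l, r in zip(left, right)]

eqs = {
    "e1": ("leaf", [1.0, 0.0]),
    "e2": ("leaf", [0.0, 1.0]),
    "e3": ("leaf", [1.0, 1.0]),
    # y's equation comes from a higher level than x's equation
    "y": ("rake", [[0.9, 0.1], [0.2, 0.8]], "e2", [[0.6, 0.4], [0.3, 0.7]], "e3"),
    "x": ("rake", [[0.5, 0.5], [0.25, 0.75]], "y", [[0.7, 0.3], [0.4, 0.6]], "e1"),
}
lam_x = lambda_query(eqs, "x")
```

Because one child of every stored equation is a leaf, only one of the two recursive calls does real work at each level, which is what bounds the depth of the recursion by the number of levels.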

3.4 $\pi$ Queries

It is relatively easy to use a similar recursive procedure to perform $\pi(x)$ queries.
Unfortunately, this approach yields an $O(\log^2 N)$-time algorithm if we simply use recursion to
calculate $\pi$ terms and calculate $\lambda$ terms using our earlier procedure. This is because there
will be $O(\log N)$ recursive calls to calculate $\pi$ values, but each is defined by an equation
that also involves a $\lambda$ term taking $O(\log N)$ time to compute.

To achieve $O(\log N)$ time, we shall instead implement $\pi(x)$ queries by defining a procedure
Calc$\pi\lambda$($x$, $i$) which returns a triple of vectors $(P, L, R)$ such that $P = \pi(x)$, $L = \lambda(y)$
and $R = \lambda(z)$, where $y$ and $z$ are the left and right children, respectively, of $x$ in $T_i$.

To compute $\pi(x)$ for some node $x$ we can do the following. Let $i$ = ind($x$). The equation
for $\pi(x)$ in $T_i$ is of the form:

$$\pi(x) = D_i(x) \cdot (\pi(u) * C_i(x) \cdot \lambda(v))$$

where $u$ is the parent of $x$ in $T_i$ and $v$ its sibling. We then call procedure Calc$\pi\lambda$($u$, $i+1$),
which will return the triple $(\pi(u), \lambda(v), \lambda(x))$, from which we immediately can compute $\pi(x)$
using the above equation.
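The final step of a $\pi$ query can be sketched as below, assuming the vectors $\pi(u)$ and $\lambda(v)$ have already been obtained. The function names and all numeric values are hypothetical. The `belief` helper shows the usual last step of a query, multiplying $\pi(x)$ and $\lambda(x)$ component-wise and normalizing.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def pi_from_parent(D_x, C_x, pi_u, lam_v):
    # pi(x) = D_i(x) . (pi(u) * C_i(x) . lambda(v)), with * taken component-wise
    mixed = [p * c for p, c in zip(pi_u, matvec(C_x, lam_v))]
    return matvec(D_x, mixed)

def belief(pi_x, lam_x):
    # Normalized component-wise product of pi(x) and lambda(x)
    prod = [p * l for p, l in zip(pi_x, lam_x)]
    total = sum(prod)
    if total == 0:
        raise ValueError("evidence has zero probability under the model")
    return [v / total for v in prod]

# Illustrative 2-state values only.
D_x = [[0.9, 0.2], [0.1, 0.8]]
C_x = [[1.0, 0.0], [0.0, 1.0]]
pi_x = pi_from_parent(D_x, C_x, pi_u=[0.4, 0.6], lam_v=[1.0, 1.0])
bel_x = belief(pi_x, lam_x=[1.0, 0.5])
```

Only two matrix by vector products are needed here, which is why each level of the recursion contributes $O(k^2)$ work.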

Procedure Calc$\pi\lambda$($x$, $i$) can be implemented in the following fashion.

Case 1: If $T_i$ is a 3-node tree with $x$ as its root, then both children of $x$ are leaves, hence
their $\lambda$ values are known, and $\pi(x)$ is a given sequence of prior probabilities for $x$.

Case 2: If $x$ does not appear in $T_{i+1}$, then one of $x$'s children is a leaf, say $e$, which is raked
at level $i$. Let $z$ be the other child. We call Calc$\pi\lambda$($u$, $i+1$), where $u$ is the parent of
$x$ in $T_i$, and receive back $(\pi(u), \lambda(z), \lambda(v))$ or $(\pi(u), \lambda(v), \lambda(z))$ according to whether $x$
was a left or right child of $u$ in $T_i$ (and $v$ is $u$'s other child). We can now compute $\pi(x)$
from $\pi(u)$ and $\lambda(v)$, and we have $\lambda(e)$ and $\lambda(z)$, so we can return the necessary triple.
Specifically,

$$\pi(x) = \begin{cases} D_i(x) \cdot (\pi(u) * A_{i+1}(u) \cdot \lambda(v)) \\ D_i(x) \cdot (\pi(u) * B_{i+1}(u) \cdot \lambda(v)) \end{cases}$$

where the choice depends on whether $x$ is the right or left child, respectively, of $u$ in $T_i$.

Case 3: If $x$ does appear in $T_{i+1}$, then we call Calc$\pi\lambda$($x$, $i+1$). This returns the correct
value of $\pi(x)$. For any child $z$ of $x$ in $T_i$ that remains a child of $x$ in $T_{i+1}$, it also returns
the correct value of $\lambda(z)$. If $z$ is a child of $x$ that does not occur in $T_{i+1}$, then it must be
the case that $z$ was raked at level $i$, so that one of $z$'s children, say $e$, is a leaf; let the
other child be $q$. In this situation Calc$\pi\lambda$($x$, $i+1$) has returned the value of $\lambda(q)$ and
we can compute

$$\lambda(z) = A_i(z) \cdot \lambda(e) * B_i(z) \cdot \lambda(q)$$

and return this value.

In all three cases, there is a constant amount of work done in addition to a single recursive
call that uses equations at a higher level. Since there are $O(\log N)$ levels of equations, each
requiring only matrix by vector multiplication, the total work done is $O(k^2 \log N)$.

FUNCTION $\lambda$-Update(Term = Value, $i$)

1. Find the (at most one) equation in $T_i$, defining some $A_i$ or $B_i$, in which Term
appears on the right-hand side; let Term' be the matrix defined by this equation
(i.e., its left-hand side).

2. Update Term'; let Value be the new value.

3. Call $\lambda$-Update(Term' = Value, $i+1$) recursively.

Figure 5: The update procedure.

4. Extended Example

In this section we illustrate the application of our algorithms to a specific example. Consider
the sequence of contracted trees shown in Figure 6. Corresponding to these trees we have
such equations as the following:



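For $T_0$:

$$\lambda(x_1) = A_0(x_1) \cdot \lambda(x_2) * B_0(x_1) \cdot \lambda(x_3) \qquad \pi(x_2) = D_0(x_2) \cdot (\pi(x_1) * C_0(x_2) \cdot \lambda(x_3))$$

For $T_1$:

$$\lambda(x_1) = A_1(x_1) \cdot \lambda(x_2) * B_1(x_1) \cdot \lambda(e_9) \qquad \pi(x_2) = D_1(x_2) \cdot (\pi(x_1) * C_1(x_2) \cdot \lambda(e_9))$$

For $T_2$:

$$\lambda(x_1) = A_2(x_1) \cdot \lambda(x_4) * B_2(x_1) \cdot \lambda(e_9) \qquad \pi(x_4) = D_2(x_4) \cdot (\pi(x_1) * C_2(x_4) \cdot \lambda(e_9))$$

For $T_3$:

$$\lambda(x_1) = A_3(x_1) \cdot \lambda(e_1) * B_3(x_1) \cdot \lambda(e_9)$$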

Now consider, for instance, the effect of an update for $e_2$. Since it is raked immediately,
the new value of $\lambda(e_2)$ is incorporated in $B_1(x_6)$.

From subsequent Rake operations we know that $A_2(x_4)$ depends on $B_1(x_6)$, and $A_3(x_1)$
depends on $A_2(x_4)$, so we must also update these values as follows:

$$A_2(x_4) = A_1(x_4) \cdot \mathrm{Diag}_{B_1(x_6) \cdot \lambda(\ldots)} \cdot A_1(x_6)$$

$$A_3(x_1) = A_2(x_1) \cdot \mathrm{Diag}_{B_2(x_4) \cdot \lambda(\ldots)} \cdot A_2(x_4)$$

Finally, consider a query for $x_7$. Since $x_7$ is raked together with $e_5$ in $T_0$, we follow
the steps outlined above and generate the following calls: Calc$\pi\lambda$($x_7$, 0), Calc$\pi\lambda$($x_4$, 1),
Calc$\pi\lambda$($x_4$, 2), and Calc$\pi\lambda$($x_1$, 3). This provides us with $\pi(x_7)$. In this case, $\lambda(x_7)$
is particularly easy to compute since both of $x_7$'s children are leaf nodes. Then we simply
compute $\pi(x_7) * \lambda(x_7)$ and then normalize, giving us the conditional marginal distribution
Bel($x_7$) as required.

5. Join Trees

Perhaps the best-known technique for computing with arbitrary (i.e., not singly-connected)
Bayesian networks uses the idea of join trees (junction trees) (Lauritzen & Spiegelhalter,
1988). In many ways a join tree can be thought of as a causal tree, albeit one with somewhat
special structure. Thus the algorithm in the previous section can be applied. However, the
structure of a join tree permits some optimization, which we describe in this section. This
becomes especially relevant in the next section, where we use the join-tree technique to
show how $O(\log N)$ updates and queries can be done for arbitrary polytrees. Our review
of join-trees and their utility is extremely brief and quite incomplete; for clear expositions
see, for instance, Spiegelhalter et al. (1993) and Pearl (1988).

Given any Bayesian network, the first step towards constructing a join-tree is to moralize
the network: insert edges between every pair of parents of a common node, and then treat all
edges in the graph as being undirected (Spiegelhalter et al., 1993). The resulting undirected
graph is called the moral graph. We are interested in undirected graphs that are chordal:
every cycle of length 4 or more should contain a chord (i.e., an edge between two nodes
that are non-adjacent in the cycle). If the moral graph is not chordal, it is necessary to add
edges to make it so; various techniques for this triangulation stage are known (for instance,
see Spiegelhalter et al., 1993).
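The moralization step can be sketched in a few lines. The representation of the network as a dictionary from each node to its list of parents, and the function name `moralize`, are assumptions made for this illustration; triangulation is not shown.

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph of a DAG
    given as a mapping node -> list of parents."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))   # drop the edge direction
        for a, b in combinations(ps, 2):
            edges.add(frozenset((a, b)))       # connect parents of a common node
    return edges

# Hypothetical four-node network: a -> c <- b, c -> d
parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
moral = moralize(parents)
```

In this small example the only added edge is the one between `a` and `b`, the two parents of `c`.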

If $p$ is a probability distribution represented in a Bayesian network $G = (V, E)$, and
$M = (V, F)$ is the result of moralizing and then triangulating $G$, then:

1. $M$ has at most $|V|$ cliques, say $C_1, \ldots, C_{|V|}$.

2. The cliques can be ordered so that for each $i > 1$ there is some $j(i) < i$ such that

$$C_i \cap C_{j(i)} = C_i \cap (C_1 \cup C_2 \cup \cdots \cup C_{i-1}).$$

The tree $T$ formed by treating the cliques as nodes, and connecting each node $C_i$ to
its "parent" $C_{j(i)}$, is called a join tree.

3. $p = \prod_i p(C_i \mid C_{j(i)})$

4. $p(C_i \mid C_{j(i)}) = p(C_i \mid C_{j(i)} \cap C_i)$

From 2 and 3, we see that if we direct the edges in T away from the "parent" cliques, 
the resulting directed tree is in fact a Bayesian causal tree that can represent the original 
distribution p. This is true no matter what the form of the original graph. Of course, the 
price is that the cliques may be large, and so the domain size (the number of possible values 
of a clique node) can be of exponential size. This is why this technique is not guaranteed 
to be efficient. 
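Property 2 can be checked mechanically for a given ordering of cliques. The sketch below is an illustration under stated assumptions: cliques are Python sets, indices are 0-based, and `join_tree_parents` is a hypothetical helper name. It uses the fact that $C_i \cap C_j$ equals $C_i \cap (C_1 \cup \cdots \cup C_{i-1})$ exactly when that running intersection is contained in $C_j$.

```python
def join_tree_parents(cliques):
    """For cliques listed in order, return {i: j(i)} for every i >= 1,
    or raise ValueError if property 2 fails for this ordering."""
    parent = {}
    seen = set(cliques[0])
    for i in range(1, len(cliques)):
        sep = cliques[i] & seen                       # C_i ∩ (C_1 ∪ ... ∪ C_{i-1})
        j = next((j for j in range(i) if sep <= cliques[j]), None)
        if j is None:
            raise ValueError(f"no parent clique for index {i} in this ordering")
        parent[i] = j
        seen |= cliques[i]
    return parent

good = [{"a", "b", "c"}, {"c", "d"}, {"d", "e"}]
tree = join_tree_parents(good)

bad = [{"a", "b"}, {"c", "d"}, {"b", "c"}]
try:
    join_tree_parents(bad)
    bad_ok = True
except ValueError:
    bad_ok = False
```

Directing each edge from $C_{j(i)}$ to $C_i$ then gives the tree on which the Rake technique operates.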

We can use the Rake technique of Section 2 on the directed join tree without any
modification. However, property 4 above shows that the conditional probability matrices
in the join tree have a special structure. We can use this to gain some efficiency. In the
following, let $k$ be the domain size of the variables in $G$ as usual. Let $n$ be the maximum
size of cliques in the join tree; without loss of generality we can assume that all cliques are
of the same size (because we can add "dummy" variables). Thus the domain size of each
clique is $K = k^n$. Finally, let $c$ be the maximum intersection size of a clique and its parent
(i.e., $|C_{j(i)} \cap C_i|$) and $L = k^c$.

In the standard algorithm, we would represent $p(C_i \mid C_{j(i)})$ as a $K \times K$ matrix, $M_{C_i \mid C_{j(i)}}$.
However, $p(C_i \mid C_{j(i)} \cap C_i)$ can be represented as a smaller $L \times K$ matrix, $M_{C_i \mid C_{j(i)} \cap C_i}$. By
property 4 above, $M_{C_i \mid C_{j(i)}}$ is identical to $M_{C_i \mid C_{j(i)} \cap C_i}$, except that many rows are repeated.
Thus there is a $K \times L$ matrix $J$ such that

$$M_{C_i \mid C_{j(i)}} = J \cdot M_{C_i \mid C_{j(i)} \cap C_i}.$$

($J$ is actually a simple matrix whose entries are 0 and 1, with exactly one 1 per row; however
we do not use this fact.)
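To make the role of $J$ concrete, the sketch below builds it for a hypothetical parent clique $\{a, b\}$ and child clique $\{b, c\}$ over binary variables, so that $K = 4$ and $L = 2$. The variable names and probabilities are illustrative assumptions; row $r$ of $J$ simply selects the separator state that parent state $r$ projects to.

```python
from itertools import product

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

k = 2
parent_vars = ("a", "b")        # hypothetical parent clique C_{j(i)}
sep_vars = ("b",)               # C_{j(i)} ∩ C_i
parent_states = list(product(range(k), repeat=len(parent_vars)))   # K = 4 states
sep_states = list(product(range(k), repeat=len(sep_vars)))         # L = 2 states

# L x K matrix: rows are separator states, columns are child states (b, c).
M_small = [[0.3, 0.7, 0.0, 0.0],    # given b = 0
           [0.0, 0.0, 0.6, 0.4]]    # given b = 1

def project(state):
    return tuple(state[parent_vars.index(v)] for v in sep_vars)

# K x L selector: exactly one 1 per row.
J = [[1 if project(ps) == ss else 0 for ss in sep_states] for ps in parent_states]
M_full = matmul(J, M_small)         # K x K matrix with repeated rows
```

The rows of `M_full` for parent states that agree on `b` are identical, which is the repetition the factored representation avoids storing.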

4. A clique is a maximal completely-connected subgraph. 
Our claim is that, in the case of join trees, the following is true. First, the matrices
$A_i$ and $B_i$ used in the Rake algorithm can be stored in factored form, as the product of
two matrices of dimension $K \times L$ and $L \times K$ respectively. So, for instance, we factor $A_i$
as $A_i^l \cdot A_i^r$. We never need to explicitly compute, or store, the full matrices. As we have
just seen, this claim is true when $i = 0$ because the $M$ matrices factor this way. The proof
for $i \ge 1$ uses an inductive argument, which we illustrate below. The second claim is that,
when the matrices are stored in factored form, all the matrix multiplications used in the
Rake algorithm are of one of the following types: 1) an $L \times K$ matrix times a $K \times L$ matrix,
2) an $L \times K$ matrix times a $K \times K$ diagonal matrix, 3) an $L \times L$ matrix times an $L \times K$
matrix, or 4) an $L \times K$ matrix times a vector.

To prove these claims consider, for instance, the equation defining $B_{i+1}$ in terms of lower-level
matrices. From Section 2, $B_{i+1}(u) = B_i(u) \cdot \mathrm{Diag}_{A_i(x) \cdot \lambda(e)} \cdot B_i(x)$. But, by assumption,
this is:

$$(B_i^l(u) \cdot B_i^r(u)) \cdot \mathrm{Diag}_{(A_i^l(x) \cdot A_i^r(x)) \cdot \lambda(e)} \cdot (B_i^l(x) \cdot B_i^r(x)),$$

which, using associativity, is clearly equivalent to

$$B_i^l(u) \cdot \left[ \left( \left( B_i^r(u) \cdot \mathrm{Diag}_{A_i^l(x) \cdot (A_i^r(x) \cdot \lambda(e))} \right) \cdot B_i^l(x) \right) \cdot B_i^r(x) \right].$$

However, every multiplication in this expression is one of the forms stated earlier. Identifying
$B_{i+1}^l(u)$ as $B_i^l(u)$ and $B_{i+1}^r(u)$ as the bracketed part of the expression proves this case, and
of course the case where we rake a left child (so that $A_{i+1}(u)$ is updated) is analogous.
Thus, even using the most straightforward technique for matrix multiplication, the cost of
updating $B_{i+1}$ is $O(KL^2) = O(k^{n+2c})$. This contrasts with $O(K^3)$ if we do not factor the
matrices, and may represent a worthwhile speedup if $c$ is small. Note that the overall time
for an update using this scheme is $O(k^{n+2c} \log N)$. Queries, which only involve matrix by
vector multiplication, require $O(k^{n+c} \log N)$ time.
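The associativity argument can be checked numerically. The sketch below is an illustration with made-up random factors for $K = 4$ and $L = 2$; the variable names are hypothetical, and plain Python lists are used instead of a matrix library. It compares the unfactored update with the factored one, which never forms a $K \times K$ product.

```python
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def scale_cols(M, d):
    # M . Diag_d: multiply column c of M by d[c]
    return [[entry * d[c] for c, entry in enumerate(row)] for row in M]

rng = random.Random(0)
K, L = 4, 2

def rand(rows, cols):
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]

Bl_u, Br_u = rand(K, L), rand(L, K)     # B_i(u) = Bl_u . Br_u
Al_x, Ar_x = rand(K, L), rand(L, K)     # A_i(x) = Al_x . Ar_x
Bl_x, Br_x = rand(K, L), rand(L, K)     # B_i(x) = Bl_x . Br_x
lam_e = [rng.random() for _ in range(K)]

# Unfactored update: full K x K matrices throughout.
B_u, A_x, B_x = matmul(Bl_u, Br_u), matmul(Al_x, Ar_x), matmul(Bl_x, Br_x)
full = matmul(scale_cols(B_u, matvec(A_x, lam_e)), B_x)

# Factored update: the left factor is unchanged, the right factor is the bracketed part.
d = matvec(Al_x, matvec(Ar_x, lam_e))
Br_next = matmul(matmul(scale_cols(Br_u, d), Bl_x), Br_x)
Bl_next = Bl_u

recombined = matmul(Bl_next, Br_next)
max_err = max(abs(a - b) for ra, rb in zip(recombined, full) for a, b in zip(ra, rb))
```

The new right factor is again $L \times K$, so the factored form is preserved from one level to the next.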

For many join trees the difference between $N$ and $\log N$ is unimportant, because the
clique domain size $K$ is often enormous and dominates the complexity. Indeed, $K$ and $L$
may be so large that we cannot represent the required matrices explicitly. Of course, in such 
cases our technique has little to offer. But there will be other cases in which the benefits 
will be worthwhile. The most important general class in which this is so, and our immediate 
reason for presenting the technique for join trees, is the case of polytrees. 

6. Polytrees 

A polytree is a singly connected Bayesian network; we drop the assumption of Section 2 
that each node has at most one parent. Polytrees offer much more flexibility than causal
trees, and yet there is a well-known process that can update and query in $O(N)$ time, just
as for causal trees. For this reason polytrees are an extremely popular class of networks. 

We suspect that it is possible to present an $O(\log N)$ algorithm for updates and queries
in polytrees, as a direct extension of the ideas in Section 2. Instead we propose a different
technique, which involves converting a polytree to its join tree and then using the ideas of
the preceding section. The basis for this is the simple observation that the moral graph of a
polytree is already chordal. Thus (as we show in detail below) little is lost by considering
the join tree instead of the original polytree. The specific property of polytrees that we
require is the following. We omit the proof of this well-known proposition.

Proposition 4: If $T$ is the moral graph of a polytree $P = (V, E)$ then $T$ is chordal, and
the set of maximal cliques in $T$ is $\{\{v\} \cup \mathrm{parents}(v) : v \in V\}$.

Let p be the maximum number of parents of any node. From the proposition, every 
maximal clique in the join tree has at most $p + 1$ variables, and so the domain size of a node
in the join tree is $K = k^{p+1}$. This may be large, but recall that the conditional probability
matrix in the original polytree, for a variable with p parents, has K entries anyway since we 
must give the conditional distribution for every combination of the node's parents. Thus K 
is really a measure of the size of the polytree itself. 
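As a small illustration of Proposition 4, the sketch below collects the sets $\{v\} \cup \mathrm{parents}(v)$ for a hypothetical five-node polytree and computes $K = k^{p+1}$. The dictionary representation and function names are assumptions; as an extra step not stated in the proposition, families that are strictly contained in another family (such as the singleton of a parentless node) are filtered out so that only maximal sets remain.

```python
def polytree_cliques(parents):
    """Families {v} ∪ parents(v), keeping only those not strictly
    contained in another family."""
    families = {frozenset([v, *ps]) for v, ps in parents.items()}
    return {f for f in families if not any(f < g for g in families)}

def clique_domain_size(parents, k):
    p = max(len(ps) for ps in parents.values())   # maximum number of parents
    return k ** (p + 1)

# Hypothetical polytree: a -> c <- b, c -> d, c -> e
parents = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"], "e": ["c"]}
cliques = polytree_cliques(parents)
K = clique_domain_size(parents, k=3)
largest_overlap = max(len(f & g) for f in cliques for g in cliques if f != g)
```

In this example no two cliques share more than one node, and $K = 27$ equals the number of entries in the conditional probability table of `c`.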

It now follows from the proposition above that we can perform query and update in
polytrees in time $O(K^3 \log N)$, simply by using the algorithm of Section 2 on the directed
join tree. But, as noted in Section 5, we can do better. Recall that the savings depend on
$c$, the maximum size of the intersection between any node and its parent in the join tree.
However, when the join tree is formed from a polytree, no two cliques can share more than a
single node. This follows immediately from Proposition 4, for if two cliques have more than
one node in common then there must be either two nodes that share more than one parent,
or else a node and one of its parents that both share yet another parent. Neither of these is
consistent with the network being a polytree. Thus in the complexity bounds of Section 5,
we can put $c = 1$. It follows that we can process updates in $O(K k^2 \log N) = O(k^{p+3} \log N)$
time and queries in $O(k^{p+2} \log N)$.
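The exponents follow by substituting $n = p + 1$ and $c = 1$ into the bounds of Section 5. The short sketch below tabulates the per-level operation counts, ignoring constant factors; the function name and the sample values of $k$ and $p$ are illustrative assumptions.

```python
def polytree_costs(k, p):
    """Per-level operation counts (up to constants) for a polytree whose
    nodes have k values and at most p parents."""
    n, c = p + 1, 1                 # clique size and separator size
    K, L = k ** n, k ** c
    return {
        "unfactored_update": K ** 3,    # O(K^3)
        "factored_update": K * L ** 2,  # O(K L^2) = O(k^(p+3))
        "query": K * L,                 # O(K L)   = O(k^(p+2))
    }

costs = polytree_costs(k=3, p=2)
```

For $k = 3$ and $p = 2$ this gives 19683 operations per level without factoring against 243 with factoring, each multiplied by the $O(\log N)$ number of levels.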

7. Application: Towards Automated Site-Specific Muta-Genesis 

An experiment which is commonly performed in biology laboratories is a procedure where 
a particular site in a protein is changed (i.e., a single amino acid is mutated) and then
tested to see whether the protein settles into a different conformation. In many cases, with 
overwhelming probability the protein does not change its secondary structure outside the 
mutated region. This process is often called muta-genesis. Delcher et al. (1993) developed a 
probabilistic model of a protein structure which is basically a long chain. The length of the 
chain varies between 300-500 nodes. The nodes in the network are either protein-structure 
nodes (PS-nodes) or evidence nodes (E-nodes). Each PS-node in the network is a discrete 
random variable Xi that assumes values corresponding to descriptors of secondary sequence 
structure: helix, sheet or coil. With each PS-node the model associates an evidence node 
that corresponds to an occurrence of a particular subsequence of amino acids at a particular 
location in the protein. 

In our model, protein-structure nodes are finite strings over the alphabet $\{h, e, c\}$. For
example the string hhhhhh is a string of six residues in an $\alpha$-helical conformation, while
eecc is a string of two residues in a $\beta$-sheet conformation followed by two residues folded as
a coil. Evidence nodes are nodes that contain information about a particular region of the 
protein. Thus, the main idea is to represent physical and statistical rules in the form of a 
probabilistic network. 

In our first set of experiments we converged on the following model that, while clearly
biologically naive, seems to match in prediction accuracy many existing approaches such as
neural networks. The network looks like a set of PS-nodes connected as a chain. To each
such node we connect a single evidence node. In our experiments the PS-nodes are strings
of length two or three over the alphabet $\{h, e, c\}$ and the evidence nodes are strings of the
same length over the set of amino acids. The following example clarifies our representation.
Assume we have a string of amino acids GSAT. We model the string as a network comprised
of three evidence nodes GS, SA, AT and three PS-nodes. The network is shown in Figure 7.
A correct prediction will assign the values cc, ch, and hh to the PS-nodes as shown in the
figure.

Figure 7: Example of causal tree model using pairs, showing protein segment GSAT with
corresponding secondary structure cchh.
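The pair representation amounts to sliding a window of width two over the sequence. The helper below is a hypothetical illustration of that encoding, not code from the original system.

```python
def windows(sequence, width=2):
    """Overlapping substrings of the given width, one per chain position."""
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]

evidence_nodes = windows("GSAT")    # amino-acid pairs observed at each position
structure_nodes = windows("cchh")   # values a correct prediction assigns to the PS-nodes
```

Setting `width=3` gives the triples used in the variant of the model with strings of length three.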

Now that we have a probabilistic model, we can test the robustness of the protein or 
whether small changes in the protein affect the structure of certain critical sites in the 
protein. In our experiments, the probabilistic network performs a "simulated evolution" of 
the protein, namely the simulator repeatedly mutates a region in the chain and then tests 
whether some designated sites in the protein that are coiled into a helix are predicted to 
remain in this conformation. The main goal of the experiment was to test if stable bonds far 
away from the mutated location were affected. Our previous results (Delcher et al., 1993) 
support the current thesis in the biology community, namely that local changes
rarely affect distant structure.

The algorithms we presented in the previous sections of the paper are perfectly suited 
for this type of application and are predicted to generate a factor of 10 improvement in 
efficiency over the current brute-force implementation presented by Delcher et al. (1993) 
where each change is propagated throughout the network. 

8. Summary 

This paper has proposed several new algorithms that yield a substantial improvement in the 
performance of probabilistic networks in the form of causal trees. Our updating procedures 
absorb sufficient information in the tree such that our query procedure can compute the 
correct probability distribution of any node given the current evidence. In addition, all 
procedures execute in time $O(\log N)$, where $N$ is the size of the network. Our algorithms
are expected to generate orders-of-magnitude speed-ups for causal trees that contain long 
paths (not necessarily chains) and for which the matrices of conditional probabilities are 
relatively small. We are currently experimenting with our approach with singly connected 
networks (polytrees). It is likely to be more difficult to generalize the techniques to general 
networks. Since it is known that the general problem of inference in probabilistic networks is
NP-hard (Cooper, 1990), it is not possible (unless P = NP) to obtain polynomial-time incremental
solutions of the type discussed in this paper for general probabilistic networks. The other
natural open question is extending the approach developed in this paper to other dynamic 
operations on probabilistic networks such as addition and deletion of nodes and modifying 
the matrices of conditional probabilities (as a result of learning). 

It would also be interesting to investigate the practical logarithmic-time parallel algo- 
rithms for probabilistic networks on realistic parallel models of computation. One of the 
main goals of massively parallel AI research is to produce networks that perform real-time 
inference over large knowledge-bases very efficiently (i.e., in time proportional to the depth 
of the network rather than the size of the network) by exploiting massive parallelism. Jerry 
Feldman pioneered this philosophy in the context of neural architectures (see Stanfill and
Waltz, 1986, Shastri, 1993, and Feldman and Ballard, 1982). To achieve this type of
performance in the neural network framework, we typically postulate a parallel hardware that
associates a processor with each node in a network and typically ignores communication re- 
quirements. With careful mapping to parallel architectures one can indeed achieve efficient 
parallel execution of specific classes of inference operations (see Mani and Shastri, 1994, 
Kasif, 1990, and Kasif and Delcher, 1992). The techniques outlined in this paper present
an alternative architecture that supports very fast (sub-linear time) response capability on 
sequential machines based on preprocessing. However, our approach is obviously limited to 
applications where the number of updates and queries at any time is constant. One would 
naturally hope to develop parallel computers that support real-time probabilistic reasoning 
for general networks. 